Take an already tidy data set and filter for times of the year where daily variation is > X (a user-designated threshold)
For all days with that much variability, calculate the average of the preceding 2 weeks
Calculate the average of the following 1, 2, 3, …, n days
If that average exceeds the preceding 2-week average by Y amount, designate the day as “use”; if not, designate it as “toss”
Filter for usable days after dates with high variability
Generate a pretty plot of the days
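The steps above can be sketched in base R roughly as follows. This is only a minimal sketch under stated assumptions: the toy data frame `days`, its columns `daily_mean` and `daily_var`, the helpers `mean_before()` and `mean_after()`, and the threshold values are all hypothetical and not taken from the buoy data. It uses a single value of n; repeating the calculation for n = 1, 2, 3, … would be a loop over `n`.

```r
# Toy daily series: one row per day (made-up values, not the buoy data)
days <- data.frame(
  date       = seq(as.Date("2016-01-01"), by = "day", length.out = 60),
  daily_mean = c(rep(15, 30), rep(17, 30)),  # temperature steps up after day 30
  daily_var  = 1                             # daily max minus daily min
)
days$daily_var[c(10, 30, 50)] <- 3           # three high-variability days

x <- 2.5  # X: variability threshold (user designated)
y <- 0.5  # Y: required increase over the preceding 2 weeks
n <- 7    # number of following days to average

# Mean of the k values before position i (NA if fewer than k days exist)
mean_before <- function(v, i, k) {
  if (i > k) mean(v[(i - k):(i - 1)], na.rm = TRUE) else NA_real_
}
# Mean of the k values after position i (NA if fewer than k days exist)
mean_after <- function(v, i, k) {
  if (i + k <= length(v)) mean(v[(i + 1):(i + k)], na.rm = TRUE) else NA_real_
}

idx <- seq_len(nrow(days))
days$prev_2wk <- vapply(idx, function(i) mean_before(days$daily_mean, i, 14), numeric(1))
days$next_n   <- vapply(idx, function(i) mean_after(days$daily_mean, i, n), numeric(1))

# Keep only the high-variability days, then label each one
flagged <- days[days$daily_var > x, ]
flagged$status <- ifelse(
  !is.na(flagged$prev_2wk) & !is.na(flagged$next_n) &
    flagged$next_n > flagged$prev_2wk + y,
  "use", "toss"
)

# Usable days only
usable <- flagged[flagged$status == "use", ]
```

In this sketch, days without a full 2-week window before them (or n days after them) are labelled “toss”; whether that is the right choice for the real data is an open decision.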
## Parsed with column specification:
## cols(
## year = col_double(),
## month = col_double(),
## day = col_double(),
## hour = col_double(),
## wspeed = col_double(),
## wdir = col_double(),
## water_temp = col_double(),
## wvht = col_double(),
## wvdpd = col_double(),
## wvapd = col_double(),
## buoy = col_character()
## )
# First-pass plot to look at SST across years, by month
ggplot(sst, aes(x = water_temp, y = as.factor(month))) +
  geom_density_ridges() + # From the ggridges package
  ylab("Month") +
  xlab("Water temperature (°C)") +
  theme_bw()
https://allisonhorst.shinyapps.io/missingexplorer/#section-introduction
There are quite a lot of missing years in the data set. The naniar package can report, for each variable, the number and percentage of missing values relative to the entire data set.
## # A tibble: 17 x 3
## variable n_miss pct_miss
## <chr> <int> <dbl>
## 1 wvdpd 85669 20.1
## 2 wvht 85575 20.1
## 3 wvapd 85571 20.1
## 4 water_temp 53309 12.5
## 5 wdir 53264 12.5
## 6 wspeed 52180 12.2
## 7 year 0 0
## 8 month 0 0
## 9 day 0 0
## 10 hour 0 0
## 11 buoy 0 0
## 12 year1 0 0
## 13 month1 0 0
## 14 day1 0 0
## 15 hour1 0 0
## 16 date_time 0 0
## 17 date 0 0
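A per-variable summary with the same columns as the table above (`variable`, `n_miss`, `pct_miss`) can be computed by hand, which makes it clear what the percentages mean. The sketch below uses base R on a made-up data frame `toy` (hypothetical values); on the real data, `naniar::miss_var_summary()` is the function that returns a table of this shape.

```r
# Toy stand-in for the buoy data; the values are made up
toy <- data.frame(
  year       = c(2015, 2015, 2016, 2016),
  water_temp = c(14.2, NA, NA, 15.1),
  wvht       = c(NA, NA, NA, 1.2)
)

# Count missing values per column
n_miss <- colSums(is.na(toy))

miss_summary <- data.frame(
  variable = names(n_miss),
  n_miss   = as.integer(n_miss),
  pct_miss = 100 * as.numeric(n_miss) / nrow(toy)  # percent of rows missing
)

# Sort so the most incomplete variables come first
miss_summary <- miss_summary[order(-miss_summary$n_miss), ]
miss_summary
```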
There is a lot of missing data for entire years, and I am not sure how to deal with this, especially since most of the missing data is in the more recent years.
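One way to see where the gaps fall is to compute the percentage of missing temperature records per year. The following is a small base R sketch on made-up values; the data frame `toy` and the result name `pct_missing_by_year` are hypothetical.

```r
# Toy stand-in: a few records per year, with more gaps in the latest year
toy <- data.frame(
  year       = c(2015, 2015, 2016, 2016, 2017, 2017, 2017, 2017),
  water_temp = c(14.2, 14.0, 15.1, NA, NA, NA, NA, 13.9)
)

# Share of missing water_temp records within each year, as a percentage
p <- tapply(is.na(toy$water_temp), toy$year, mean) * 100

pct_missing_by_year <- data.frame(
  year     = as.numeric(names(p)),
  pct_miss = as.numeric(p)
)
pct_missing_by_year
```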
Questions for group: 1) How to graph variability? Do you like the idea of plotting maximum daily variability across the year? 2) What to do when the recent data are missing? For example, what if you ran an experiment in 2020, but there are large holes in the data after 2016?
To make a daily variability plot, I will just take the latest full year of data (2016) and make a nicer version of that plot.
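A minimal sketch of that step, assuming daily variability means the daily maximum minus the daily minimum of `water_temp`. The toy values are made up, and the names `y2016`, `daily`, and `daily_range` are hypothetical; the plotting part only runs if ggplot2 is installed.

```r
# Toy stand-in for hourly buoy records (made-up values)
toy <- data.frame(
  year  = 2016,
  month = 1,
  day   = rep(1:3, each = 4),
  hour  = rep(c(0, 6, 12, 18), times = 3),
  water_temp = c(14.0, 14.5, 15.5, 15.0,
                 14.2, NA,   16.2, 15.1,
                 13.9, 14.1, 14.4, 14.0)
)
toy$date <- as.Date(ISOdate(toy$year, toy$month, toy$day))

# Keep the latest full year only
y2016 <- toy[toy$year == 2016, ]

# Daily range = daily max minus daily min; rows with NA are dropped by
# aggregate()'s default na.action (na.omit)
daily <- aggregate(water_temp ~ date, data = y2016,
                   FUN = function(v) max(v) - min(v))
names(daily)[2] <- "daily_range"

# Plot the daily range through the year (skipped if ggplot2 is unavailable)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(daily, aes(x = date, y = daily_range)) +
    geom_line() +
    xlab("Date") +
    ylab("Daily range (°C)") +
    theme_bw()
  print(p)
}
```

Days with many missing hours will have an artificially small range, so it may be worth requiring a minimum number of readings per day before computing it.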
I’m interested in how variability from 1994-1996 compares to 2015-2017. I’m going to filter the data for a few years early in the data set to visualize how the “smoothed” data compare to the more recent years. I’m not feeling super confident in this approach, but thought it could be cool to look at.